Classification with Sparse Overlapping Groups
نویسندگان
چکیده
Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high dimensional machine learning applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. This is sometimes referred to as a “sparse group” lasso procedure, and it allows for more flexibility than traditional group lasso methods. Our framework generalizes conventional sparse group lasso further by allowing for overlapping groups, an additional flexiblity that presents further challenges. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization program that automatically selects similar features for learning in high dimensions. We establish consistency results for the SOSlasso for classification problems using the logistic regression setting, which specializes to results for the lasso and the group lasso, some known and some new. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, source localization problems in Magnetoencephalography (MEG), and analyzing gene activation patterns in microarray data analysis. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.
منابع مشابه
Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...
متن کاملRice Classification and Quality Detection Based on Sparse Coding Technique
Classification of various rice types and determination of its quality is a major issue in the scientific and commercial fields associated with modern agriculture. In recent years, various image processing techniques are used to identify different types of agricultural products. There are also various color and texture-based features in order to achieve the desired results in this area. In this ...
متن کاملFace Recognition using an Affine Sparse Coding approach
Sparse coding is an unsupervised method which learns a set of over-complete bases to represent data such as image and video. Sparse coding has increasing attraction for image classification applications in recent years. But in the cases where we have some similar images from different classes, such as face recognition applications, different images may be classified into the same class, and hen...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کامل